\[
\frac{\partial L^{F}_{\mathrm{MSE}}}{\partial w_i}
= \mu\,(a_i - a^{*}_{H})\,\frac{\partial a_i}{\partial w_i}\, I(i \in \mathcal{L}),
\tag{6.23}
\]
where $I(\cdot)$ is an indicator function defined as
\[
I(i \in \mathcal{L}) =
\begin{cases}
1, & \text{the $i$-th layer is supervised with FR-GAL}, \\
0, & \text{otherwise}.
\end{cases}
\tag{6.24}
\]
As mentioned above, we employ several FR-GALs in the training process; hence $I(i \in \mathcal{L})$ indicates whether the $i$-th layer is supervised with an FR-GAL. Note that FR-GAL is only used to supervise the low-level features, so no gradient is propagated to the high-level features.
In this way, we compute the full gradient with respect to $w_i$ and update it as
\[
w_i \leftarrow w_i - \eta_1 \delta_{w_i},
\tag{6.25}
\]
where $\eta_1$ is the learning rate for $w_i$.
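To make the step concrete, below is a minimal PyTorch-style sketch of the $w_i$ update, assuming autograd supplies the chain-rule factor $\partial a_i/\partial w_i$; the names `w_list`, `feats`, `feats_hr`, `supervised_layers`, and `task_loss` are illustrative assumptions, not identifiers from BiRe-ID.

```python
import torch

def step_w(w_list, feats, feats_hr, supervised_layers, task_loss,
           mu=1e-3, eta1=1e-2):
    """One SGD step on the latent weights w_i, Eqs. (6.23)-(6.25).

    feats[i] is the low-level feature a_i of the 1-bit model and
    feats_hr[i] the real-valued reference a_H^*; the FR-GAL MSE term
    is added only when I(i in L) = 1, Eq. (6.24).
    """
    loss = task_loss
    for i, (a, a_star) in enumerate(zip(feats, feats_hr)):
        if i in supervised_layers:  # indicator I(i in L)
            # 0.5 * mu * ||a_i - a_H^*||^2; its gradient w.r.t. w_i is
            # mu * (a_i - a_H^*) * da_i/dw_i, matching Eq. (6.23)
            loss = loss + 0.5 * mu * ((a - a_star.detach()) ** 2).sum()
    grads = torch.autograd.grad(loss, w_list, retain_graph=True)
    with torch.no_grad():
        for w, g in zip(w_list, grads):
            w -= eta1 * g  # w_i <- w_i - eta_1 * delta_w_i, Eq. (6.25)
```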
Update $\alpha_i$: We further update the learnable matrix $\alpha_i$ with $w_i$ fixed. Let $\delta_{\alpha_i}$ be the gradient of $\alpha_i$; we then have
\[
\delta_{\alpha_i} = \frac{\partial L}{\partial \alpha_i}
= \frac{\partial L_S}{\partial \alpha_i}
+ \frac{\partial L^{K}_{\mathrm{Adv}}}{\partial \alpha_i}
+ \frac{\partial L^{K}_{\mathrm{MSE}}}{\partial \alpha_i}
+ \frac{\partial L^{F}_{\mathrm{Adv}}}{\partial \alpha_i}
+ \frac{\partial L^{F}_{\mathrm{MSE}}}{\partial \alpha_i},
\tag{6.26}
\]
and
\[
\alpha_i \leftarrow \alpha_i - \eta_2 \delta_{\alpha_i},
\tag{6.27}
\]
where $\eta_2$ is the learning rate for $\alpha_i$. Furthermore,
\[
\frac{\partial L^{K}_{\mathrm{Adv}}}{\partial \alpha_i}
= -\frac{1}{1 - D(\alpha_i \circ b_{w_i}; W_D)}\,
\frac{\partial D}{\partial (\alpha_i \circ b_{w_i})}\, b_{w_i},
\tag{6.28}
\]
\[
\frac{\partial L^{K}_{\mathrm{MSE}}}{\partial \alpha_i}
= -\lambda\,(w_i - \alpha_i \circ b_{w_i})\, b_{w_i},
\tag{6.29}
\]
\[
\frac{\partial L^{F}_{\mathrm{Adv}}}{\partial \alpha_i}
= -\frac{1}{1 - D(a_i; W_D)}\,
\frac{\partial D}{\partial a_i}\,
\frac{\partial a_i}{\partial \alpha_i}\, I(i \in \mathcal{L}),
\tag{6.30}
\]
\[
\frac{\partial L^{F}_{\mathrm{MSE}}}{\partial \alpha_i}
= \mu\,(a_i - a^{*}_{H})\,
\frac{\partial a_i}{\partial \alpha_i}\, I(i \in \mathcal{L}).
\tag{6.31}
\]
Update $p_i$: Finally, we update the remaining parameters $p_i$ with $w_i$ and $\alpha_i$ fixed. Let $\delta_{p_i}$ denote the gradient of $p_i$:
\[
\delta_{p_i} = \frac{\partial L_S}{\partial p_i},
\tag{6.32}
\]
\[
p_i \leftarrow p_i - \eta_3 \delta_{p_i},
\tag{6.33}
\]
where $\eta_3$ is the learning rate for the other parameters. These derivations show that the refining process can be trained end to end. The training process of our BiRe-ID is summarized in Algorithm 13. We update each group of parameters independently while keeping the other parameters of the convolutional layers fixed, which enhances the variation of the feature maps in every layer. In this way, we accelerate the convergence of training and fully exploit the potential of our 1-bit networks.
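A condensed sketch of this alternating scheme (in the spirit of Algorithm 13, which we do not reproduce here) is given below; the attribute names `latent_weights`, `scales`, and `other_params` and the two loss callbacks are assumptions for illustration. Each group is stepped with its own learning rate while the others stay fixed, and only $w_i$ and $\alpha_i$ receive the GAL terms, since $p_i$ sees only $L_S$ per Eq. (6.32).

```python
import torch

def train_step(model, batch, task_loss_fn, gal_loss_fn,
               eta1=1e-2, eta2=1e-3, eta3=1e-2):
    """One alternating update over the three parameter groups."""
    groups = [
        (model.latent_weights, eta1, True),   # w_i,     Eq. (6.25)
        (model.scales,         eta2, True),   # alpha_i, Eq. (6.27)
        (model.other_params,   eta3, False),  # p_i,     Eq. (6.33)
    ]
    for params, eta, use_gal in groups:
        loss = task_loss_fn(model, batch)            # L_S
        if use_gal:
            loss = loss + gal_loss_fn(model, batch)  # kernel/feature GAL
        grads = torch.autograd.grad(loss, params)    # others held fixed
        with torch.no_grad():
            for p, g in zip(params, grads):
                p -= eta * g
```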